Collaborator

@anzr299 anzr299 commented Jan 12, 2026

Changes

The approach is straightforward: _quantize_weights works as before on 2D weights. The difference is in the Hessian calculation, where the Hessian is now 3D for both 2D and 3D weights. For 2D weights it has the shape (1, hidden_dim, hidden_dim); previously it was (hidden_dim, hidden_dim).
For 3D weights it has the shape (num_experts/batch, hidden_dim, hidden_dim).
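
To illustrate the shape convention described above, here is a minimal NumPy sketch, not the NNCF implementation. The function name `accumulate_batched_hessian`, the input layout (samples on the second-to-last axis, hidden_dim last), and the usual GPTQ form H = (2 / n) * X^T X are all assumptions made for this example.

```python
import numpy as np


def accumulate_batched_hessian(inputs: np.ndarray) -> np.ndarray:
    """Hypothetical helper: build a batched Hessian from layer inputs.

    inputs: (samples, hidden_dim) for a 2D weight, or
            (num_experts, samples, hidden_dim) for a 3D weight.
    Returns an array of shape (batch, hidden_dim, hidden_dim), where
    batch is 1 in the 2D case and num_experts in the 3D case.
    """
    x = inputs.astype(np.float64)
    if x.ndim == 2:
        # Promote the 2D case to a batch of one so both cases share one layout.
        x = x[None, ...]
    if x.ndim != 3:
        raise ValueError(f"expected 2D or 3D inputs, got {x.ndim}D")
    num_samples = x.shape[1]
    # Per-batch H_b = (2 / n) * X_b^T X_b (assumed GPTQ-style Hessian).
    return (2.0 / num_samples) * np.einsum("bsi,bsj->bij", x, x)
```

With this layout, a dense layer and an MoE layer differ only in the size of the leading axis, which is what lets the same downstream loop handle both.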

The 3D ("batched") Hessian is then looped over: for each entry, the corresponding 2D weight is extracted and passed to the existing _quantize_weights function, which returns the scale and zero point (zp) as usual. These scales and zero points are stacked together in a collector variable. In the 2D case the stacked result is flattened; in the 3D case the stacked scale and zp are returned as is.
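
The loop-and-stack flow can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `quantize_2d_stub` is a hypothetical stand-in for the 2D routine (it ignores the Hessian and uses plain per-row min/max), and "flattened" is interpreted here as dropping the leading batch axis of size one.

```python
import numpy as np


def quantize_2d_stub(weight: np.ndarray, hessian: np.ndarray, num_bits: int = 4):
    """Hypothetical stand-in for the 2D routine; the Hessian is unused here."""
    levels = 2**num_bits - 1
    w_min = weight.min(axis=1, keepdims=True)
    w_max = weight.max(axis=1, keepdims=True)
    # Guard against constant rows, which would give a zero scale.
    scale = np.maximum((w_max - w_min) / levels, 1e-12)
    zero_point = np.round(-w_min / scale)
    return scale, zero_point


def quantize_batched(weight: np.ndarray, hessian: np.ndarray):
    """Loop over the batched Hessian and quantize each 2D slice."""
    is_3d = weight.ndim == 3
    weights = weight if is_3d else weight[None, ...]
    if hessian.shape[0] != weights.shape[0]:
        raise ValueError("Hessian batch size must match the number of 2D weights")

    scales, zero_points = [], []  # collector variables
    for w2d, h2d in zip(weights, hessian):
        scale, zp = quantize_2d_stub(w2d, h2d)
        scales.append(scale)
        zero_points.append(zp)

    scale = np.stack(scales)
    zp = np.stack(zero_points)
    if not is_3d:
        # 2D case: drop the batch axis (assumed meaning of "flattened").
        scale, zp = scale[0], zp[0]
    return scale, zp
```

Keeping the 2D routine untouched and wrapping it in a loop means the per-expert results are independent, at the cost of one sequential call per expert.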

NOTE: Scale Estimation combined with GPTQ is not yet supported for 3D weights.

Reason for changes

Support 3D weights in GPTQ for models such as MoE.

Related tickets

175789 & 175212

Tests

@anzr299 anzr299 requested a review from a team as a code owner January 12, 2026 08:35
@anzr299 anzr299 marked this pull request as draft January 12, 2026 08:35
@github-actions github-actions bot added the NNCF OpenVINO Pull requests that updates NNCF OpenVINO label Jan 12, 2026